Load in some packages

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
theme_set(theme_bw(base_size = 18))
library(lavaan)
## This is lavaan 0.5-20
## lavaan is BETA software! Please report any bugs.
library(semPlot)
## Warning: running command ''/usr/bin/otool' -L '/Library/Frameworks/
## R.framework/Resources/library/tcltk/libs//tcltk.so'' had status 69
## Warning: replacing previous import by 'ggplot2::unit' when loading 'Hmisc'
## Warning: replacing previous import by 'ggplot2::arrow' when loading 'Hmisc'
## Warning: replacing previous import by 'scales::alpha' when loading 'Hmisc'
library(corrplot)

plot_matrix <- function(matrix_toplot){
corrplot(matrix_toplot, is.corr = FALSE, 
               type = 'lower', 
               order = "original", 
               tl.col='black', tl.cex=.75)
}

Brief intro to SEM

Structural equation modeling (SEM) can be a very useful tool in determining relationships between variables. Often SEM is used in a “confirmatory” manner, when determining whether a certain model is valid (i.e., comparing the goodness-of-fit of nested models). We can even extend SEM to study interactions across groups. SEM is sometimes referred to as causal modeling, path analysis (with latent variables), or covariances structure analysis. It subsumes a bunch of other techniques, like multiple regression, confirmatory factor analysis, ANOVA, etc.

You supply the observed relationship between variables (i.e., the covariance or correlation matrix), the # of observations, and a formal model specificiation, and SEM basically estimates parameters that will give you the “best” reproduction of the covariance matrix. The better your model fit, the better your reproduction of the covariance matrix (hence, lower chi-squared = better model)!

Mediation

For more information on how to conduct classic mediation with lavaan, check out the tutorial here.

Latent variables

Often we are interested in investigating latent, abstract variables (like “intelligence”) by obtaining multiple observable measures (e.g., high school GPA, SAT and ACT scores). Using SEM we can easily include latent variables!

Sample size

Source

SEM necessitates large sample sizes! In the literature, sample sizes commonly run 200 - 400 for models with 10 - 15 indicators. One survey of 72 SEM studies found the median sample size was 198. A sample of 150 is considered too small unless the covariance coefficients are relatively large. With over ten variables, sample size under 200 generally means parameter estimates are unstable and significance tests lack power.

One rule of thumb found in the literature is that sample size should be at least 50 more than 8 times the number of variables in the model. Mitchell (1993) advances the rule of thumb that there be 10 to 20 times as many cases as variables. Another rule of thumb, based on Stevens (1996), is to have at least 15 cases per measured variable or indicator. The researcher should go beyond these minimum sample size recommendations particularly when data are non-normal (skewed, kurtotic) or incomplete. Note also that to compute the asymptotic covariance matrix, one needs \(\frac{k(k+1)}{2}\) observations, where \(k\) is the number of variables.

Lavaan in action

Kievit, R. A., Davis, S. W., Mitchell, D. J., Taylor, J. R., Duncan, J., Henson, R. N., & Cam-CAN Research team. (2014). Distinct aspects of frontal lobe structure mediate age-related differences in fluid intelligence and multitasking. Nature communications, 5.

“Following best practice in SEM, we first report our measurement model on the full sample of N=567 (for further details on the sample and model fitting, see Methods). For this model, we hypothesize that two latent variables (fluid intelligence and multitasking; tasks shown in Fig. 1, see Methods for more detail) capture the covariance between the six behavioural variables described in the Methods section, freely estimating every factor loading. This model fits the data well, \(\chi^2\) = 15.40, degrees of freedom (df) = 8, P = 0.052, root mean square error of approximation (RMSEA) = 0.04 (0.00–0.070), comparative fit index (CFI) = 0.993, standardized root mean square residual (SRMR)=0.023, Satorra–Bentler scaling factor=1.009. As the two latent factors are positively correlated (standardized estimate = 0.325, Z = 6.17, P<0.001), we can ask whether a more parsimonious model with only a single cognitive factor (for example, ‘executive function’) shows better fit. Such a model would be compatible with a unitary perspective on the age-related decline of higher cognitive function. However, this one-factor model fits very poorly: \(\chi^2\) =334.149, df=9, P<0.00001, RMSEA=0.252 (0.231–0.275), CFI = 0.685, SRMR = 0.121, Satorra–Bentler scaling factor = 1.109, significantly worse than the two-factor model (\(\chi^2\) = 46.224, dfdiff = 1, P<0.00001).”

“For this reason, partial least squares is often considered a pragmatic alternative to the more principled, theory-driven SEM approach, which is why we here prefer the latter to test our a priori conceptualization

“SEM were fit using the package Lavaan in R, plots were generated using ggplot2 (ref. 69). We used the following guidelines for judging good fit: RMSEA<0.05 (acceptable: 0.05–0.08), CFI>0.97 (acceptable: 0.95–0.97) and SRMR<0.05 (acceptable: 0.05–0.10) and report the Satorra–Bentler scaling factor for each fitted model. All models were fit using Maximum Likelihood Estimation using robust s.e., and report overall model fit using the Satorra–Bentler scaled test statistic”

Lavaan steps & syntax

Brief dataset description

Here we have data on environmental, behavioral, genetic, and control variables. The main environmental risk was adolescent-onset use of cannabis (earlyon = 1/0, i.e., yes/no). The variable, schizo (= 1/0), indexes whether or not a person is diagnosed with schizophreniform disorder at age 26, and psysym is a self-report of the severity of psychotic symptoms at that age. The specific gene used to define the genotypes was the COMT gene, which has two alleles, namely, valine (V) and methionine (M), that affect the level of dopamine activity. We can treat “genotype” as a quantitative variable (COMT) with values 0 (for MM individuals), 1 (for VM) and 2 (for VV). Many control variables were measured, such as, chpsycq a quantitative measure of the severity of childhood psychotic symptoms.

Define your data!

Lavaan can take a correlation or covariance matrix (or even a dataframe) as input. Here, we have a covariance matrix. Remember to look at your data!

d = read.table("http://www.stanford.edu/class/psych253/data/comt-covar7.txt", sep = "\t")

lower = '
.183
.007 .500
.009 .001 .035
.556 .207 .018 13.025
.061 .010 -.007 .466 .164
.030 -.275 .096 .899 .216 25.646
.188 .134 .015 .695 .060 -.163 .321'

labels1 = c("EarlyOn", "COMT", "Schizo", "Psysym", "Conduct", "Chpsyq", "GenxEnv")
comt7.cov = getCov(lower, names = labels1); comt7.cov
##         EarlyOn   COMT Schizo Psysym Conduct Chpsyq GenxEnv
## EarlyOn   0.183  0.007  0.009  0.556   0.061  0.030   0.188
## COMT      0.007  0.500  0.001  0.207   0.010 -0.275   0.134
## Schizo    0.009  0.001  0.035  0.018  -0.007  0.096   0.015
## Psysym    0.556  0.207  0.018 13.025   0.466  0.899   0.695
## Conduct   0.061  0.010 -0.007  0.466   0.164  0.216   0.060
## Chpsyq    0.030 -0.275  0.096  0.899   0.216 25.646  -0.163
## GenxEnv   0.188  0.134  0.015  0.695   0.060 -0.163   0.321
plot_matrix(comt7.cov) 

Plan the model

Before running SEM, it’s helpful to plan out your hypothesized “model” of how the variables are related. For instance, run lm() or glm() models to get a sense for relationships! Better yet, have a priori hypotheses before collecting data (based off of the literature, etc.), and you can generate your model before you even have data!

It’s often helpful to sketch out your model beforehand, so grab a piece of paper (or find a whiteboard) for the homework, and use that to draw out your models! Remember that circles are used to describe latent variables, and squares for observed variables. Some predictor variables can be exogenous, meaning they are of “external origin” and their causes are not included in the model (i.e., no arrows are pointing towards it). The remaining variables are endogenous (e.g., a DV), meaning that they are the effects of other variables; in SEM, endogenous variables can also predict other variables!

Here we have a model that we want to recreate:

Define the model

As a note, remember that the numbers of parameters you can estimate depends on the number of variables! If you have k variables, you can estimate \(\frac{k*(k-1)}{2}\) parameters. For instance, if you have 3 variables, you can estimate \(\frac{3*2}{2} = 3\) parameters. If you have 6 variables, you can estimate \(\frac{6*5}{2} = 15\) parameters. Here, we have 7 variables, so we can estimate up to \(\frac{7*6}{2} = 21\) parameters (though we’ll probably need fewer!).

Similar to lm() and lmer(), etc., the model format has the DV on the left side of the operator, and the IV on the right side. In terms of the structural diagram, you could think of the arrow going from right to left. When the operator is a ~ this is regression, such that the DV(s) are “predicted by” the IV(s). When the operator is =~ we are defining a latent variable, reading it like L is “measured by” IVs. Variances/covariances are specified with a ~~ (e.g., DV is “correlated with” DV), and intercepts are simple regression formulas with 1 as the predictor. We include all these formulas inside single quotes.

  • Regressions
    • Y is predicted by X1 and X2: Y ~ X1 + X2
    • Y1 is predicted by X1, and Y2 is predicted by X1: Y1 + Y2 ~ X1
  • Latent Variables
    • Latent variable L is measured by variables U and V: L =~ U + V
  • Variances & covariances
    • The variance of X: X ~~ X
    • The covariance of X and Y1: X ~~ Y1
    • The covariance of X and Y2 is fixed at 0: X ~~ 0*Y2
  • Intercepts
    • The intercept of X: X ~ 1

For some practice generating the structural plot and equations, go here!

comt7.model1 = '
  Schizo ~ EarlyOn + GenxEnv + Psysym
  Psysym ~ EarlyOn + Conduct + Schizo
    Conduct ~ EarlyOn
    Chpsyq ~ COMT
'

In this model, we want to specify the following:

  1. Schizo is predicted by: EarlyOn, GenxEnv, Psysym
  2. Psysym is predicted by: EarlyOn, Conduct, Schizo
  3. Conduct is predicted by: EarlyOn
  4. Chpsyp is predicted by: COMT

Note that this model doesn’t include any definitions of latent variables, or explicit variances/covariances. However, the variances can still be estimated, and the covariances between exogenous variables can still be estimated, as we’ll see in a second!

Fit the model

Once we have the model specified, the covariance matrix all set, and we know the number of observations (n=803 in this case), we can actually fit the model!

Note that as long as we have fixed.x = F, the model will estimate the mean, variance, and covariance of your IVs as free parameters automatically. These parameters must be explicitly removed if you don’t want them.

comt7.fit1 = sem(comt7.model1, fixed.x = F, 
                 sample.cov = comt7.cov, 
                 sample.nobs = 803) 
## Found more than one class "Model" in cache; using the first, from namespace 'lavaan'
# To fit like in Kievit et al. (2014), use ML estimation with robust standard errors (only if you have a full dataset so can't actually run this here!)
# comt7.fit1b = sem(comt7.model1, fixed.x = F, 
#                  sample.cov = comt7.cov, 
#                  sample.nobs = 803,
#                  estimator = 'MLM') 
# # estimator = maximum likelihood estimation with robust standard errors and a Satorra-Bentler scaled test statistic. For complete data only.
# summary(comt7.fit1b, fit.measures = TRUE) # even more fit info

Analyze/visualize the model fit

#summary(comt7.fit1)
summary(comt7.fit1, fit.measures = TRUE) # even more fit info
## lavaan (0.5-20) converged normally after  83 iterations
## 
##   Number of observations                           803
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               33.578
##   Degrees of freedom                                10
##   P-value (Chi-square)                           0.000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              336.588
##   Degrees of freedom                                18
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.926
##   Tucker-Lewis Index (TLI)                       0.867
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -6179.642
##   Loglikelihood unrestricted model (H1)      -6162.854
## 
##   Number of free parameters                         18
##   Akaike (AIC)                               12395.285
##   Bayesian (BIC)                             12479.675
##   Sample-size adjusted Bayesian (BIC)        12422.515
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.054
##   90 Percent Confidence Interval          0.035  0.075
##   P-value RMSEA <= 0.05                          0.333
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.034
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   Schizo ~                                            
##     EarlyOn           0.086    0.038    2.279    0.023
##     GenxEnv           0.084    0.024    3.577    0.000
##     Psysym           -0.041    0.010   -4.003    0.000
##   Psysym ~                                            
##     EarlyOn           1.438    0.433    3.322    0.001
##     Conduct           2.866    0.452    6.335    0.000
##     Schizo           13.109    3.301    3.971    0.000
##   Conduct ~                                           
##     EarlyOn           0.333    0.031   10.661    0.000
##   Chpsyq ~                                            
##     COMT             -0.550    0.252   -2.183    0.029
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   EarlyOn ~~                                          
##     GenxEnv           0.188    0.011   17.368    0.000
##     COMT              0.007    0.011    0.656    0.512
##   GenxEnv ~~                                          
##     COMT              0.134    0.015    8.989    0.000
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     Schizo            0.052    0.009    5.529    0.000
##     Psysym           16.318    2.973    5.488    0.000
##     Conduct           0.143    0.007   20.037    0.000
##     Chpsyq           25.463    1.271   20.037    0.000
##     EarlyOn           0.183    0.009   20.037    0.000
##     GenxEnv           0.321    0.016   20.037    0.000
##     COMT              0.499    0.025   20.037    0.000
# Just the parameter estimates
parameterEstimates(comt7.fit1)
##        lhs op     rhs    est    se      z pvalue ci.lower ci.upper
## 1   Schizo  ~ EarlyOn  0.086 0.038  2.279  0.023    0.012    0.160
## 2   Schizo  ~ GenxEnv  0.084 0.024  3.577  0.000    0.038    0.131
## 3   Schizo  ~  Psysym -0.041 0.010 -4.003  0.000   -0.061   -0.021
## 4   Psysym  ~ EarlyOn  1.438 0.433  3.322  0.001    0.590    2.287
## 5   Psysym  ~ Conduct  2.866 0.452  6.335  0.000    1.979    3.753
## 6   Psysym  ~  Schizo 13.109 3.301  3.971  0.000    6.639   19.579
## 7  Conduct  ~ EarlyOn  0.333 0.031 10.661  0.000    0.272    0.395
## 8   Chpsyq  ~    COMT -0.550 0.252 -2.183  0.029   -1.044   -0.056
## 9   Schizo ~~  Schizo  0.052 0.009  5.529  0.000    0.033    0.070
## 10  Psysym ~~  Psysym 16.318 2.973  5.488  0.000   10.491   22.146
## 11 Conduct ~~ Conduct  0.143 0.007 20.037  0.000    0.129    0.158
## 12  Chpsyq ~~  Chpsyq 25.463 1.271 20.037  0.000   22.972   27.954
## 13 EarlyOn ~~ EarlyOn  0.183 0.009 20.037  0.000    0.165    0.201
## 14 EarlyOn ~~ GenxEnv  0.188 0.011 17.368  0.000    0.167    0.209
## 15 EarlyOn ~~    COMT  0.007 0.011  0.656  0.512   -0.014    0.028
## 16 GenxEnv ~~ GenxEnv  0.321 0.016 20.037  0.000    0.289    0.352
## 17 GenxEnv ~~    COMT  0.134 0.015  8.989  0.000    0.105    0.163
## 18    COMT ~~    COMT  0.499 0.025 20.037  0.000    0.451    0.548
# The fitted cov matrix
fitted(comt7.fit1)
## $cov
##         Schizo Psysym Condct Chpsyq ErlyOn GnxEnv COMT  
## Schizo   0.035                                          
## Psysym   0.017 12.963                                   
## Conduct -0.008  0.453  0.164                            
## Chpsyq  -0.004 -0.062 -0.001 25.614                     
## EarlyOn  0.009  0.555  0.061 -0.004  0.183              
## GenxEnv  0.016  0.663  0.063 -0.074  0.188  0.321       
## COMT     0.007  0.113  0.002 -0.275  0.007  0.134  0.499
## 
## $mean
##  Schizo  Psysym Conduct  Chpsyq EarlyOn GenxEnv    COMT 
##       0       0       0       0       0       0       0
# Plot!
semPaths(comt7.fit1, what='std', 
         node.label.cex=5,
         edge.label.cex=1.25, curvePivot = TRUE, 
         fade=FALSE)

Looking at the model output, we can see that this model is not doing very well, since the estimated covariance matrix is significantly different from the actual covariance matrix, \(\chi^2\)(10) = 33.578, p < 0.001.

We can see, however, that we’re also doing a pretty good job predicting things. As predicted from our original causal diagram, adolescent-onset use of cannabis (EarlyOn) significantly predicts whether or not a person is diagnozed w/schizophreniform disorder at age 26 (Schizo), z=2.3, p < 0.05. Schizo is also significantly predicted by GenxEnv and Psysym (\(p\)s < 0.05). So that’s good! But, let’s take a better look at where we’re failing…

Assess where you’re messing up!

plot_matrix(residuals(comt7.fit1)$cov)

residuals(comt7.fit1)
## $type
## [1] "raw"
## 
## $cov
##         Schizo Psysym Condct Chpsyq ErlyOn GnxEnv COMT  
## Schizo   0.000                                          
## Psysym   0.001  0.045                                   
## Conduct  0.001  0.012  0.000                            
## Chpsyq   0.100  0.960  0.217  0.000                     
## EarlyOn  0.000  0.000  0.000  0.034  0.000              
## GenxEnv -0.001  0.032 -0.003 -0.089  0.000  0.000       
## COMT    -0.006  0.094  0.008  0.000  0.000  0.000  0.000
## 
## $mean
##  Schizo  Psysym Conduct  Chpsyq EarlyOn GenxEnv    COMT 
##       0       0       0       0       0       0       0

We might want to add in some relationship between Psysym and Chpsyq, or between Schizo and Chpsyq, or between Conduct and Chpsyq.

mi = modificationIndices(comt7.fit1)
head(mi[order(-mi$mi), ], 10)
##        lhs op     rhs     mi    epc sepc.lv sepc.all sepc.nox
## 26  Schizo  ~  Chpsyq 12.545  0.006   0.006    0.152    0.152
## 21  Schizo ~~  Chpsyq 12.361  0.142   0.142    0.150    0.150
## 24 Conduct ~~  Chpsyq  9.688  0.210   0.210    0.103    0.103
## 38  Chpsyq  ~ Conduct  9.423  1.351   1.351    0.108    0.108
## 33 Conduct  ~  Chpsyq  9.249  0.008   0.008    0.100    0.100
## 36  Chpsyq  ~  Schizo  8.381  2.756   2.756    0.102    0.102
## 50 GenxEnv  ~  Chpsyq  5.041 -0.005  -0.005   -0.043   -0.043
## 56    COMT  ~  Chpsyq  4.953  0.019   0.019    0.136    0.136
## 44 EarlyOn  ~  Chpsyq  4.357  0.004   0.004    0.043    0.043
## 30  Psysym  ~    COMT  2.498  0.328   0.328    0.064    0.091

Here the “mi” column gives a rough approximation of how the model would improve if new parameters were added. EPC (expected parameter change) is the value this parameter would have if you added it.

Update the model

Let’s add in Chpsyq predicting Schizo, and a covariance between Conduct and Chpsyq.

comt7.model2 = '
  Schizo ~ EarlyOn + GenxEnv + Psysym + Chpsyq
  Psysym ~ EarlyOn + Conduct + Schizo
  Conduct ~ EarlyOn
  Chpsyq ~ COMT
  Conduct ~~ Chpsyq
'

comt7.fit2 = sem(comt7.model2, sample.cov = comt7.cov, sample.nobs = 803)
summary(comt7.fit2)
## lavaan (0.5-20) converged normally after  79 iterations
## 
##   Number of observations                           803
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               11.007
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.201
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   Schizo ~                                            
##     EarlyOn           0.073    0.035    2.064    0.039
##     GenxEnv           0.091    0.023    3.961    0.000
##     Psysym           -0.039    0.009   -4.334    0.000
##     Chpsyq            0.006    0.002    3.399    0.001
##   Psysym ~                                            
##     EarlyOn           1.479    0.408    3.621    0.000
##     Conduct           2.827    0.422    6.704    0.000
##     Schizo           12.545    2.936    4.273    0.000
##   Conduct ~                                           
##     EarlyOn           0.332    0.031   10.676    0.000
##   Chpsyq ~                                            
##     COMT             -0.572    0.250   -2.286    0.022
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   Conduct ~~                                          
##     Chpsyq            0.210    0.068    3.097    0.002
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     Schizo            0.049    0.008    6.276    0.000
##     Psysym           15.841    2.556    6.197    0.000
##     Conduct           0.143    0.007   20.037    0.000
##     Chpsyq           25.463    1.271   20.037    0.000

Check if you improved

anova(comt7.fit2, comt7.fit1)
## Chi Square Difference Test
## 
##            Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## comt7.fit2  8 12365 12430 11.007                                  
## comt7.fit1 10 12395 12480 33.578     22.571       2  1.256e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This new model is doing significantly better than our first model! We have a lower AIC value, and our chi-squared is significantly lower! That means we’re doing a better job reproducing the covariance matrix!

Check out where we’re still messing up…

plot_matrix(residuals(comt7.fit2)$cov)

mi = modificationIndices(comt7.fit2)
head(mi[order(-mi$mi), ], 10)
##        lhs op     rhs    mi    epc sepc.lv sepc.all sepc.nox
## 56    COMT  ~  Chpsyq 5.240  0.019   0.019    0.134    0.134
## 50 GenxEnv  ~  Chpsyq 5.008 -0.005  -0.005   -0.043   -0.043
## 44 EarlyOn  ~  Chpsyq 4.329  0.004   0.004    0.043    0.043
## 30  Psysym  ~    COMT 2.560  0.325   0.325    0.064    0.090
## 28  Psysym  ~  Chpsyq 2.147 -0.044  -0.044   -0.062   -0.062
## 24  Psysym ~~ Conduct 1.806  0.703   0.703    0.482    0.482
## 25  Psysym ~~  Chpsyq 1.806 -1.030  -1.030   -0.056   -0.056
## 29  Psysym  ~ GenxEnv 1.722  0.575   0.575    0.090    0.159
## 37  Chpsyq  ~  Psysym 1.461 -0.078  -0.078   -0.056   -0.056
## 55    COMT  ~ Conduct 1.394  0.063   0.063    0.036    0.036

Update the model…again…

Now let’s add in a parameter to see whether severity of childhood psychotic symptoms (Chpsyq) predicts severity of psychotic symptoms (Psysym), since we might be missing something there!

comt7.model3 = '
  Schizo ~ EarlyOn + GenxEnv + Psysym + Chpsyq
  Psysym ~ EarlyOn + Conduct + Schizo + Chpsyq
  Conduct ~ EarlyOn
  Chpsyq ~ COMT
  Conduct ~~ Chpsyq
'

comt7.fit3 = sem(comt7.model3, sample.cov = comt7.cov, sample.nobs = 803)
summary(comt7.fit3)
## lavaan (0.5-20) converged normally after  80 iterations
## 
##   Number of observations                           803
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                8.908
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.259
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   Schizo ~                                            
##     EarlyOn           0.082    0.038    2.192    0.028
##     GenxEnv           0.096    0.024    3.926    0.000
##     Psysym           -0.044    0.010   -4.247    0.000
##     Chpsyq            0.006    0.002    3.486    0.000
##   Psysym ~                                            
##     EarlyOn           1.345    0.447    3.009    0.003
##     Conduct           3.006    0.477    6.309    0.000
##     Schizo           14.198    3.401    4.174    0.000
##     Chpsyq           -0.045    0.033   -1.359    0.174
##   Conduct ~                                           
##     EarlyOn           0.332    0.031   10.676    0.000
##   Chpsyq ~                                            
##     COMT             -0.572    0.250   -2.286    0.022
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   Conduct ~~                                          
##     Chpsyq            0.210    0.068    3.097    0.002
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     Schizo            0.053    0.010    5.346    0.000
##     Psysym           17.251    3.279    5.261    0.000
##     Conduct           0.143    0.007   20.037    0.000
##     Chpsyq           25.463    1.271   20.037    0.000
anova(comt7.fit2, comt7.fit3)
## Chi Square Difference Test
## 
##            Df   AIC   BIC   Chisq Chisq diff Df diff Pr(>Chisq)
## comt7.fit3  7 12365 12435  8.9079                              
## comt7.fit2  8 12365 12430 11.0071     2.0992       1     0.1474
# plot_matrix(residuals(comt7.fit3)$cov)

Here we didn’t significantly decrease the \(\chi^2\), and we added more predictors, so let’s stick with the more parsimonious model.

What if we explicitly remove the covariance between GenxEnv and COMT?

comt7.model4 = '
  Schizo ~ EarlyOn + GenxEnv + Psysym + Chpsyq
  Psysym ~ EarlyOn + Conduct + Schizo + Chpsyq
  Conduct ~ EarlyOn
  Chpsyq ~ COMT
  Conduct ~~ Chpsyq
  GenxEnv ~~ 0*COMT
'

comt7.fit4 = sem(comt7.model3, sample.cov = comt7.cov, sample.nobs = 803)
summary(comt7.fit4)
## lavaan (0.5-20) converged normally after  80 iterations
## 
##   Number of observations                           803
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                8.908
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.259
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   Schizo ~                                            
##     EarlyOn           0.082    0.038    2.192    0.028
##     GenxEnv           0.096    0.024    3.926    0.000
##     Psysym           -0.044    0.010   -4.247    0.000
##     Chpsyq            0.006    0.002    3.486    0.000
##   Psysym ~                                            
##     EarlyOn           1.345    0.447    3.009    0.003
##     Conduct           3.006    0.477    6.309    0.000
##     Schizo           14.198    3.401    4.174    0.000
##     Chpsyq           -0.045    0.033   -1.359    0.174
##   Conduct ~                                           
##     EarlyOn           0.332    0.031   10.676    0.000
##   Chpsyq ~                                            
##     COMT             -0.572    0.250   -2.286    0.022
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   Conduct ~~                                          
##     Chpsyq            0.210    0.068    3.097    0.002
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     Schizo            0.053    0.010    5.346    0.000
##     Psysym           17.251    3.279    5.261    0.000
##     Conduct           0.143    0.007   20.037    0.000
##     Chpsyq           25.463    1.271   20.037    0.000
anova(comt7.fit2, comt7.fit4)
## Chi Square Difference Test
## 
##            Df   AIC   BIC   Chisq Chisq diff Df diff Pr(>Chisq)
## comt7.fit4  7 12365 12435  8.9079                              
## comt7.fit2  8 12365 12430 11.0071     2.0992       1     0.1474
# plot_matrix(residuals(comt7.fit3)$cov)

Here, note that since I’ve explicitly specified covariance in this model (or lack thereof), it overrides the default behavior, so I have to manually define the other covariances if I want to keep them!

Try other variations to get the best model!

Question B: Adding regression coefficients

Here, Sadler & Woody (2003) test an interpersonal theory of dyadic interaction: Person M’s ‘behavior’ when interacting with person F (i.e., M’s situational behavior) is influenced by (i) M’s long-term tendency to elicit the behavior (i.e., M’s behavioral trait), and (ii) F’s situational behavior. The same is assumed true for F.

The four variables, trait and behavior for each of M and F, are represented as latent variables that can be measured in multiple ways. “Reciprocal influence” is conceptualized as M’s situational behavior influencing F’s situational behavior, and vice versa. In this study, M and F were strangers, so M’s trait (which is unknown to F) couldn’t affect F’s situational behavior. However, if M and F were friends, such links would be expected to be significant and should be included in the model. Trait is measured by self-report and a friend’s report; and situational behavior is rated by self, partner and an observer. Rater biases are introduced through the use of covariance arrows.

Variable coding:

  1. Female or male (M/F)
  2. Situational or trait (S/T)
  3. Observer, partner, friend, or self rating (O/I/F/S)
  4. Rating (R)

Load in data

low.dom = '
1
.415 1
.460 .351 1
-.321 -.374 -.310 1
-.239 -.221 -.133 .626 1
-.185 -.164 -.272 .533 .345 1
.349 .307 .496 -.180 -.081 -.067 1
.308 .302 .562 -.222 -.156 -.118 .573 1
.038 -.115 -.048 .296 .167 .320 .008 -.036 1
-.072 -.160 -.124 .317 .167 .248 -.152 -.175 .414 1 
'
labels1 = c("FSOR", "FSIR", "FSSR", "MSOR", "MSIR", "MSSR", "FTFR", "FTSR", "MTSR", "MTFR")
dom.cov = getCov(low.dom, names = labels1)
plot_matrix(dom.cov)

Unconstrained paths

Here, we have latent variables that are measured by observed variables. Specifically, female trait dominance (FTZ) is measured by friends and self trait ratings (FTSR and FTFT). Similarly, female situational dominance (FSZ) is measured by self, partner, and observers’ ratings. The same is true for males. Thus, we’ll have to define these latent variables in the model, and then see how these latent variables are predicted by other latent variables!

We want to estimate the paths A and B (the effect of the individuals’ traits on their situational behavior) and C and D (the effects of individuals’ situational behavior on each other). Let’s first allow these paths to be unconstrained (i.e., we can get a different estimate for each path).

Using the * operator below “names” the regression coefficient. This is useful mainly because if we give to coefficients the same name, the function automatically constrains them to be the same.

dyad.model1 = '
  # Latent variable definitions for trait, FTZ & MTZ, and situation, FSZ & MSZ  
    FTZ =~ FTSR + FTFR
    MTZ =~ MTSR + MTFR
    FSZ =~ FSSR + FSIR + FSOR
    MSZ =~ MSSR + MSIR + MSOR
    
    # regressions
    FSZ ~ B*FTZ + C*MSZ
    MSZ ~ A*MTZ + D*FSZ
     
    # variances and covariances
    # Residual correls among M ratings, MTSR, MSSR, FSIR
    MTSR + MSSR ~~ FSIR
    MSSR ~~ MTSR
    
    # Residual correls among F ratings, FTSR, FSSR, MSIR
    FTSR + FSSR ~~ MSIR
    FSSR ~~ FTSR
    
    # Residual correls among Observer situational ratings, MSOR, FSOR   
    MSOR ~~ FSOR    
'
dyad.fit1 = sem(dyad.model1, fixed.x = F, sample.cov = dom.cov, sample.nobs = 112)
summary(dyad.fit1) #good p value
## lavaan (0.5-20) converged normally after  41 iterations
## 
##   Number of observations                           112
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               17.306
##   Degrees of freedom                                23
##   P-value (Chi-square)                           0.794
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   FTZ =~                                              
##     FTSR              1.000                           
##     FTFR              1.120    0.231    4.858    0.000
##   MTZ =~                                              
##     MTSR              1.000                           
##     MTFR              1.129    0.382    2.952    0.003
##   FSZ =~                                              
##     FSSR              1.000                           
##     FSIR              0.833    0.167    4.984    0.000
##     FSOR              0.928    0.175    5.297    0.000
##   MSZ =~                                              
##     MSSR              1.000                           
##     MSIR              1.127    0.207    5.435    0.000
##     MSOR              1.678    0.302    5.563    0.000
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   FSZ ~                                               
##     FTZ        (B)    0.696    0.136    5.116    0.000
##     MSZ        (C)   -0.283    0.160   -1.769    0.077
##   MSZ ~                                               
##     MTZ        (A)    0.418    0.152    2.741    0.006
##     FSZ        (D)   -0.231    0.115   -2.008    0.045
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   MTSR ~~                                             
##     FSIR             -0.044    0.073   -0.599    0.549
##   FSIR ~~                                             
##     MSSR              0.058    0.069    0.838    0.402
##   MTSR ~~                                             
##     MSSR              0.130    0.073    1.785    0.074
##   FTSR ~~                                             
##     MSIR             -0.028    0.059   -0.474    0.635
##   FSSR ~~                                             
##     MSIR              0.074    0.061    1.213    0.225
##   FTSR ~~                                             
##     FSSR              0.147    0.074    1.985    0.047
##   FSOR ~~                                             
##     MSOR              0.008    0.057    0.144    0.886
##   FTZ ~~                                              
##     MTZ              -0.073    0.062   -1.187    0.235
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     FTSR              0.485    0.111    4.380    0.000
##     FTFR              0.375    0.124    3.025    0.002
##     MTSR              0.629    0.142    4.422    0.000
##     MTFR              0.545    0.163    3.342    0.001
##     FSSR              0.498    0.098    5.070    0.000
##     FSIR              0.661    0.105    6.291    0.000
##     FSOR              0.571    0.101    5.665    0.000
##     MSSR              0.677    0.099    6.842    0.000
##     MSIR              0.571    0.091    6.280    0.000
##     MSOR              0.088    0.106    0.837    0.403
##     FTZ               0.491    0.143    3.419    0.001
##     MTZ               0.350    0.152    2.310    0.021
##     FSZ               0.158    0.073    2.167    0.030
##     MSZ               0.189    0.065    2.920    0.004
semPaths(dyad.fit1, what='std', 
         node.label.cex=5,
         edge.label.cex=1.25, curvePivot = TRUE, 
         fade=FALSE)

This model is ok, but let’s see if we can get a simpler model that works! Let’s try taking out most of the covariance terms.

dyad.model2 = '
  # Latent variable definitions for trait, FTZ & MTZ, and situation, FSZ & MSZ  
    FTZ =~ FTSR + FTFR
    MTZ =~ MTSR + MTFR
    FSZ =~ FSSR + FSIR + FSOR
    MSZ =~ MSSR + MSIR + MSOR
    
    # regressions
    FSZ ~ B*FTZ + C*MSZ
    MSZ ~ A*MTZ + D*FSZ
     
    # Residual correls among M ratings, MTSR, MSSR
    MSSR ~~ MTSR
    
    # Residual correls among F ratings, FTSR, FSSR
    FSSR ~~ FTSR
'
dyad.fit2 = sem(dyad.model2, fixed.x = F, sample.cov = dom.cov, sample.nobs = 112) 
summary(dyad.fit2) # Wow, even better !
## lavaan (0.5-20) converged normally after  38 iterations
## 
##   Number of observations                           112
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               20.970
##   Degrees of freedom                                28
##   P-value (Chi-square)                           0.827
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   FTZ =~                                              
##     FTSR              1.000                           
##     FTFR              1.122    0.231    4.862    0.000
##   MTZ =~                                              
##     MTSR              1.000                           
##     MTFR              1.110    0.370    2.996    0.003
##   FSZ =~                                              
##     FSSR              1.000                           
##     FSIR              0.818    0.167    4.899    0.000
##     FSOR              0.916    0.172    5.321    0.000
##   MSZ =~                                              
##     MSSR              1.000                           
##     MSIR              1.170    0.220    5.308    0.000
##     MSOR              1.804    0.338    5.334    0.000
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   FSZ ~                                               
##     FTZ        (B)    0.713    0.137    5.194    0.000
##     MSZ        (C)   -0.281    0.158   -1.778    0.075
##   MSZ ~                                               
##     MTZ        (A)    0.392    0.145    2.711    0.007
##     FSZ        (D)   -0.209    0.107   -1.949    0.051
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   MTSR ~~                                             
##     MSSR              0.133    0.073    1.823    0.068
##   FTSR ~~                                             
##     FSSR              0.145    0.075    1.937    0.053
##   FTZ ~~                                              
##     MTZ              -0.074    0.062   -1.196    0.232
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     FTSR              0.486    0.110    4.401    0.000
##     FTFR              0.375    0.124    3.029    0.002
##     MTSR              0.624    0.142    4.380    0.000
##     MTFR              0.550    0.160    3.445    0.001
##     FSSR              0.498    0.098    5.073    0.000
##     FSIR              0.663    0.105    6.316    0.000
##     FSOR              0.580    0.100    5.782    0.000
##     MSSR              0.697    0.100    6.978    0.000
##     MSIR              0.590    0.092    6.384    0.000
##     MSOR              0.036    0.113    0.319    0.750
##     FTZ               0.489    0.143    3.415    0.001
##     MTZ               0.358    0.153    2.339    0.019
##     FSZ               0.156    0.072    2.168    0.030
##     MSZ               0.179    0.061    2.918    0.004
anova(dyad.fit1,dyad.fit2)
## Chi Square Difference Test
## 
##           Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## dyad.fit1 23 2925.1 3012.1 17.306                              
## dyad.fit2 28 2918.8 2992.2 20.970     3.6634       5     0.5988

This simpler model is definitely the better one! We have more dfs (fewer parameters estimated), a lower AIC, and then \(\chi^2\) isn’t significantly worse!

Constrained paths

Let’s see if we need our “C” and “D” paths to be different. We have 1 less parameter to estimate if we constrain them to be the same, so it would be more parsimonious if we can have a constrained model!

dyad.model3 = '
  # Latent variable definitions for trait, FTZ & MTZ, and situation, FSZ & MSZ  
    FTZ =~ FTSR + FTFR
    MTZ =~ MTSR + MTFR
    FSZ =~ FSSR + FSIR + FSOR
    MSZ =~ MSSR + MSIR + MSOR
    
    # regressions
    FSZ ~ B*FTZ + C*MSZ
    MSZ ~ A*MTZ + C*FSZ
     
    # Residual correls among M ratings, MTSR, MSSR
    MSSR ~~ MTSR
    
    # Residual correls among F ratings, FTSR, FSSR
    FSSR ~~ FTSR
'

dyad.fit3 = sem(dyad.model3, fixed.x = T, sample.cov = dom.cov, sample.nobs = 112) #notice how i set fixed.x = T here so it wouldn't automatically add terms I didn't want
summary(dyad.fit3)
## lavaan (0.5-20) converged normally after  35 iterations
## 
##   Number of observations                           112
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               21.055
##   Degrees of freedom                                29
##   P-value (Chi-square)                           0.857
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   FTZ =~                                              
##     FTSR              1.000                           
##     FTFR              1.118    0.229    4.881    0.000
##   MTZ =~                                              
##     MTSR              1.000                           
##     MTFR              1.091    0.365    2.991    0.003
##   FSZ =~                                              
##     FSSR              1.000                           
##     FSIR              0.823    0.167    4.918    0.000
##     FSOR              0.925    0.172    5.366    0.000
##   MSZ =~                                              
##     MSSR              1.000                           
##     MSIR              1.155    0.211    5.484    0.000
##     MSOR              1.775    0.316    5.625    0.000
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   FSZ ~                                               
##     FTZ        (B)    0.714    0.138    5.186    0.000
##     MSZ        (C)   -0.237    0.061   -3.874    0.000
##   MSZ ~                                               
##     MTZ        (A)    0.390    0.144    2.717    0.007
##     FSZ        (C)   -0.237    0.061   -3.874    0.000
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   MTSR ~~                                             
##     MSSR              0.132    0.073    1.819    0.069
##   FTSR ~~                                             
##     FSSR              0.148    0.074    1.985    0.047
##   FTZ ~~                                              
##     MTZ              -0.073    0.062   -1.167    0.243
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     FTSR              0.485    0.110    4.412    0.000
##     FTFR              0.380    0.123    3.088    0.002
##     MTSR              0.617    0.144    4.283    0.000
##     MTFR              0.556    0.159    3.502    0.000
##     FSSR              0.499    0.098    5.081    0.000
##     FSIR              0.663    0.105    6.305    0.000
##     FSOR              0.576    0.100    5.742    0.000
##     MSSR              0.695    0.100    6.959    0.000
##     MSIR              0.588    0.092    6.390    0.000
##     MSOR              0.040    0.111    0.358    0.720
##     FTZ               0.489    0.143    3.429    0.001
##     MTZ               0.366    0.156    2.352    0.019
##     FSZ               0.161    0.072    2.237    0.025
##     MSZ               0.182    0.061    2.996    0.003
anova(dyad.fit1,dyad.fit2,dyad.fit3) #looks like constraining it made the model better! So they probably aren't actually different
## Chi Square Difference Test
## 
##           Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## dyad.fit1 23 2925.1 3012.1 17.306                              
## dyad.fit2 28 2918.8 2992.2 20.970     3.6634       5     0.5988
## dyad.fit3 29 2916.8 2987.5 21.055     0.0850       1     0.7707
#could also test it this way:
dyad.fit3a = sem(dyad.model2, fixed.x = F, sample.cov = dom.cov, sample.nobs = 112, constraints = 'C - D == 0', start = dyad.fit2) #using our model2 design, but constraining C-D to be 0 (i.e., C = D)
# summary(dyad.fit3a) #same result as before

Question C: Group Comparisons

For this question, we assess Prof Jeanne Tsai’s “Affect Valuation Theory” (AVT), about the antecedents and consequences of ideal affect, and how it might vary across cultures. Specifically, this theory hypothesizes that culture influences ideal affect more than actual affect, whereas temperament/personality influences actual affect more than ideal affect. Further, engaging in rigorous activities is more strongly predicted by ideal affect, but depression is more predicted by actual affect. In addition, culture (Asian American, Hong Kong Chinese, and European American / AA, CH, EA) might influences these processes.

To start out, let’s specify the model for the whole sample, ignoring ‘group’.

d = read.csv("http://www.stanford.edu/class/psych253/data/jt-data1.csv")

avt.model1 = '
  # regressions 
    ideahap + actuhap ~ cultatt + temperatt
    rigoract + depress ~ ideahap + actuhap + cultatt + temperatt
    
    # variances and covariances
    rigoract ~~ 0*depress
'

avt.fit1 = sem(avt.model1, fixed.x = F, data = d)
summary(avt.fit1)
## lavaan (0.5-20) converged normally after  62 iterations
## 
##                                                   Used       Total
##   Number of observations                           215         239
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                9.547
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.008
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt           0.075    0.022    3.455    0.001
##     temperatt         0.131    0.031    4.260    0.000
##   actuhap ~                                           
##     cultatt           0.111    0.035    3.165    0.002
##     temperatt         0.354    0.049    7.188    0.000
##   rigoract ~                                          
##     ideahap           0.508    0.262    1.939    0.053
##     actuhap          -0.071    0.163   -0.436    0.663
##     cultatt          -0.105    0.088   -1.193    0.233
##     temperatt         0.314    0.136    2.318    0.020
##   depress ~                                           
##     ideahap           1.687    1.724    0.979    0.328
##     actuhap          -1.646    1.073   -1.534    0.125
##     cultatt           0.401    0.577    0.696    0.487
##     temperatt        -6.187    0.891   -6.942    0.000
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   cultatt ~~                                          
##     temperatt         0.280    0.057    4.879    0.000
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.100    0.010   10.368    0.000
##     actuhap           0.257    0.025   10.368    0.000
##     rigoract          1.473    0.142   10.368    0.000
##     depress          63.595    6.134   10.368    0.000
##     cultatt           1.118    0.108   10.368    0.000
##     temperatt         0.563    0.054   10.368    0.000

Model 1 is almost saturated - only 2 df is left for testing - yet it is poor. This suggests that many of the paths are unhelpful, i.e., have coeffs near 0. These unhelpful paths should be removed and replaced by useful paths. I was surprised that sem() estimated cov(rigoract, depress), because these vars are not exogenous. It appears that sem(), by default, puts a cov path between any pair of variables that are not already linked in the model! This is a waste of params, so explicitly set these unwanted cov paths equal to 0.

A common finding in this field is that ideal HAP and actual HAP are correlated. So introduce a cov link between them.

#This model removes the paths with low p-values, and adds the new covariance
avt.model2 = '
  # regressions
    # X + Y ~ U + V + W
    
    ideahap + actuhap ~ cultatt + temperatt
    rigoract ~ ideahap + temperatt
    depress ~ actuhap + temperatt
    
    # variances and covariances
    # X ~~ Y
    rigoract ~~ 0*depress
    ideahap ~~ actuhap
'
avt.fit2 = sem(avt.model2, fixed.x = F, data = d)
summary(avt.fit2) #much better!    
## lavaan (0.5-20) converged normally after  45 iterations
## 
##                                                   Used       Total
##   Number of observations                           215         239
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                3.556
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.615
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt           0.075    0.022    3.455    0.001
##     temperatt         0.131    0.031    4.260    0.000
##   actuhap ~                                           
##     cultatt           0.111    0.035    3.165    0.002
##     temperatt         0.354    0.049    7.188    0.000
##   rigoract ~                                          
##     ideahap           0.407    0.256    1.588    0.112
##     temperatt         0.250    0.119    2.107    0.035
##   depress ~                                           
##     actuhap          -1.224    1.053   -1.162    0.245
##     temperatt        -5.877    0.845   -6.953    0.000
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.033    0.011    2.991    0.003
##   cultatt ~~                                          
##     temperatt         0.280    0.057    4.879    0.000
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.100    0.010   10.368    0.000
##     actuhap           0.257    0.025   10.368    0.000
##     rigoract          1.485    0.143   10.368    0.000
##     depress          64.104    6.183   10.368    0.000
##     cultatt           1.118    0.108   10.368    0.000
##     temperatt         0.563    0.054   10.368    0.000
anova(avt.fit1, avt.fit2)
## Chi Square Difference Test
## 
##          Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## avt.fit1  2 3758.8 3822.8 9.5469                              
## avt.fit2  5 3746.8 3800.7 3.5562    -5.9907       3          1

Group analysis

Now, we might wonder if the model we specified above applies to any one of the three groups (EA, AA, and CH).

grplab1 = c("EA", "AA", "CH")

avt.group1 = sem(avt.model2, fixed.x = F, data = d, group = "group", meanstructure = F)
summary(avt.group1)
## lavaan (0.5-20) converged normally after 130 iterations
## 
##                                                   Used       Total
##   Number of observations per group         
##   1                                                 65          75
##   2                                                 67          80
##   3                                                 83          84
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               15.149
##   Degrees of freedom                                15
##   P-value (Chi-square)                           0.441
## 
## Chi-square for each group:
## 
##   1                                              9.798
##   2                                              2.776
##   3                                              2.574
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt           0.062    0.037    1.661    0.097
##     temperatt         0.172    0.066    2.626    0.009
##   actuhap ~                                           
##     cultatt           0.005    0.054    0.095    0.925
##     temperatt         0.574    0.094    6.079    0.000
##   rigoract ~                                          
##     ideahap           0.687    0.627    1.096    0.273
##     temperatt         0.309    0.352    0.878    0.380
##   depress ~                                           
##     actuhap          -0.280    2.128   -0.132    0.895
##     temperatt        -5.327    1.998   -2.666    0.008
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.014    0.016    0.861    0.390
##   cultatt ~~                                          
##     temperatt         0.138    0.076    1.819    0.069
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.090    0.016    5.701    0.000
##     actuhap           0.187    0.033    5.701    0.000
##     rigoract          2.394    0.420    5.701    0.000
##     depress          55.005    9.649    5.701    0.000
##     cultatt           1.047    0.184    5.701    0.000
##     temperatt         0.340    0.060    5.701    0.000
## 
## 
## Group 2 [2]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt           0.007    0.028    0.245    0.806
##     temperatt         0.130    0.041    3.205    0.001
##   actuhap ~                                           
##     cultatt           0.121    0.059    2.041    0.041
##     temperatt         0.454    0.085    5.327    0.000
##   rigoract ~                                          
##     ideahap           0.593    0.559    1.061    0.289
##     temperatt         0.014    0.189    0.072    0.943
##   depress ~                                           
##     actuhap          -4.514    2.094   -2.155    0.031
##     temperatt        -4.544    1.774   -2.562    0.010
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.037    0.013    2.757    0.006
##   cultatt ~~                                          
##     temperatt         0.265    0.096    2.766    0.006
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.049    0.009    5.788    0.000
##     actuhap           0.218    0.038    5.788    0.000
##     rigoract          1.035    0.179    5.788    0.000
##     depress          68.089   11.764    5.788    0.000
##     cultatt           1.063    0.184    5.788    0.000
##     temperatt         0.514    0.089    5.788    0.000
## 
## 
## Group 3 [3]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt           0.093    0.044    2.122    0.034
##     temperatt         0.061    0.053    1.152    0.250
##   actuhap ~                                           
##     cultatt           0.198    0.068    2.904    0.004
##     temperatt         0.204    0.082    2.479    0.013
##   rigoract ~                                          
##     ideahap           0.118    0.315    0.375    0.708
##     temperatt         0.299    0.157    1.911    0.056
##   depress ~                                           
##     actuhap          -0.079    1.496   -0.053    0.958
##     temperatt        -7.142    1.221   -5.848    0.000
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.044    0.022    2.023    0.043
##   cultatt ~~                                          
##     temperatt         0.092    0.073    1.252    0.211
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.125    0.019    6.442    0.000
##     actuhap           0.303    0.047    6.442    0.000
##     rigoract          1.087    0.169    6.442    0.000
##     depress          61.947    9.616    6.442    0.000
##     cultatt           0.799    0.124    6.442    0.000
##     temperatt         0.547    0.085    6.442    0.000

The fit to each group is acceptable when the parameters were free to vary across groups, except that the fit to Group 1 is almost significant (\(\chi^2\) = 9.8 with 5 df, p = 0.08).

What if we were to equate params across groups?

avt.group2 = sem(avt.model2, fixed.x = F, data = d, group = "group", meanstructure = F, group.equal = "regressions")
summary(avt.group2)
## lavaan (0.5-20) converged normally after  73 iterations
## 
##                                                   Used       Total
##   Number of observations per group         
##   1                                                 65          75
##   2                                                 67          80
##   3                                                 83          84
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               38.476
##   Degrees of freedom                                31
##   P-value (Chi-square)                           0.167
## 
## Chi-square for each group:
## 
##   1                                             18.238
##   2                                              8.749
##   3                                             11.490
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt (.p1.)    0.037    0.020    1.853    0.064
##     temprtt (.p2.)    0.115    0.029    3.991    0.000
##   actuhap ~                                           
##     cultatt (.p3.)    0.106    0.035    3.015    0.003
##     temprtt (.p4.)    0.402    0.051    7.829    0.000
##   rigoract ~                                          
##     ideahap (.p5.)    0.300    0.253    1.185    0.236
##     temprtt (.p6.)    0.205    0.114    1.795    0.073
##   depress ~                                           
##     actuhap (.p7.)   -1.041    1.046   -0.996    0.319
##     temprtt (.p8.)   -6.240    0.911   -6.850    0.000
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.014    0.017    0.846    0.397
##   cultatt ~~                                          
##     temperatt         0.138    0.076    1.819    0.069
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.092    0.016    5.701    0.000
##     actuhap           0.203    0.036    5.701    0.000
##     rigoract          2.419    0.424    5.701    0.000
##     depress          55.735    9.777    5.701    0.000
##     cultatt           1.047    0.184    5.701    0.000
##     temperatt         0.340    0.060    5.701    0.000
## 
## 
## Group 2 [2]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt (.p1.)    0.037    0.020    1.853    0.064
##     temprtt (.p2.)    0.115    0.029    3.991    0.000
##   actuhap ~                                           
##     cultatt (.p3.)    0.106    0.035    3.015    0.003
##     temprtt (.p4.)    0.402    0.051    7.829    0.000
##   rigoract ~                                          
##     ideahap (.p5.)    0.300    0.253    1.185    0.236
##     temprtt (.p6.)    0.205    0.114    1.795    0.073
##   depress ~                                           
##     actuhap (.p7.)   -1.041    1.046   -0.996    0.319
##     temprtt (.p8.)   -6.240    0.911   -6.850    0.000
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.037    0.014    2.696    0.007
##   cultatt ~~                                          
##     temperatt         0.265    0.096    2.766    0.006
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.050    0.009    5.788    0.000
##     actuhap           0.220    0.038    5.788    0.000
##     rigoract          1.051    0.182    5.788    0.000
##     depress          70.889   12.248    5.788    0.000
##     cultatt           1.063    0.184    5.788    0.000
##     temperatt         0.514    0.089    5.788    0.000
## 
## 
## Group 3 [3]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt (.p1.)    0.037    0.020    1.853    0.064
##     temprtt (.p2.)    0.115    0.029    3.991    0.000
##   actuhap ~                                           
##     cultatt (.p3.)    0.106    0.035    3.015    0.003
##     temprtt (.p4.)    0.402    0.051    7.829    0.000
##   rigoract ~                                          
##     ideahap (.p5.)    0.300    0.253    1.185    0.236
##     temprtt (.p6.)    0.205    0.114    1.795    0.073
##   depress ~                                           
##     actuhap (.p7.)   -1.041    1.046   -0.996    0.319
##     temprtt (.p8.)   -6.240    0.911   -6.850    0.000
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.053    0.023    2.271    0.023
##   cultatt ~~                                          
##     temperatt         0.092    0.073    1.252    0.211
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.129    0.020    6.442    0.000
##     actuhap           0.328    0.051    6.442    0.000
##     rigoract          1.095    0.170    6.442    0.000
##     depress          62.504    9.702    6.442    0.000
##     cultatt           0.799    0.124    6.442    0.000
##     temperatt         0.547    0.085    6.442    0.000
anova(avt.group1,avt.group2)
## Chi Square Difference Test
## 
##            Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## avt.group1 15 3662.9 3824.6 15.149                              
## avt.group2 31 3654.2 3762.0 38.476     23.328      16     0.1053

The constrained model is not significantly worse, and it has a lower AIC. So prefer it! The sample sizes in this data set are ‘small’ for SEM, and this is one reason why different models are not significantly different - not enough power to make the distinctions.

Impose equality constraints in unconstrained model

Let us continue with model refinement purely to see how this might be done in the general case. We can start with the unconstrained model, and then impose equality constraints on params. Or we can start with the constrained model, and then relax the equality constraints. Let’s start by imposing equality constraints in the unconstrained model.

#this one constrains cultatt to be the same across both groups, but lets temperatt to be different
avt.model3 = '
  # regressions
    # X + Y ~ U + V + W
    # Effect of cultatt on ideal hap, and on actual hap, same across cultures
    # Effect of temperatt on ideal hap, and on actual hap, different across cultures
    # All other params differ across cultures
    
    ideahap ~ c(k1, k1, k1)*cultatt + c(l1, l2, l3)*temperatt
    actuhap ~ c(k4, k4, k4)*cultatt + c(l4, l5, l6)*temperatt
    rigoract ~ ideahap + temperatt
    depress ~ actuhap + temperatt
    
    # variances and covariances
    # X ~~ Y
    rigoract ~~ 0*depress
    ideahap ~~ actuhap
'
avt.group3 = sem(avt.model3, fixed.x = F, data = d, group = "group", meanstructure = F)
summary(avt.group3)
## lavaan (0.5-20) converged normally after 133 iterations
## 
##                                                   Used       Total
##   Number of observations per group         
##   1                                                 65          75
##   2                                                 67          80
##   3                                                 83          84
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               23.582
##   Degrees of freedom                                19
##   P-value (Chi-square)                           0.213
## 
## Chi-square for each group:
## 
##   1                                             13.449
##   2                                              4.432
##   3                                              5.701
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt   (k1)    0.035    0.020    1.777    0.076
##     temperatt (l1)    0.183    0.064    2.836    0.005
##   actuhap ~                                           
##     cultatt   (k4)    0.097    0.035    2.806    0.005
##     temperatt (l4)    0.537    0.095    5.650    0.000
##   rigoract ~                                          
##     ideahap           0.687    0.633    1.085    0.278
##     temperatt         0.309    0.352    0.877    0.380
##   depress ~                                           
##     actuhap          -0.280    2.033   -0.138    0.890
##     temperatt        -5.327    1.965   -2.711    0.007
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.011    0.017    0.693    0.488
##   cultatt ~~                                          
##     temperatt         0.138    0.076    1.819    0.069
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.091    0.016    5.701    0.000
##     actuhap           0.195    0.034    5.701    0.000
##     rigoract          2.394    0.420    5.701    0.000
##     depress          55.005    9.649    5.701    0.000
##     cultatt           1.047    0.184    5.701    0.000
##     temperatt         0.340    0.060    5.701    0.000
## 
## 
## Group 2 [2]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt   (k1)    0.035    0.020    1.777    0.076
##     temperatt (l2)    0.115    0.040    2.918    0.004
##   actuhap ~                                           
##     cultatt   (k4)    0.097    0.035    2.806    0.005
##     temperatt (l5)    0.466    0.082    5.710    0.000
##   rigoract ~                                          
##     ideahap           0.593    0.548    1.081    0.280
##     temperatt         0.014    0.188    0.072    0.943
##   depress ~                                           
##     actuhap          -4.514    2.114   -2.136    0.033
##     temperatt        -4.544    1.780   -2.553    0.011
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.037    0.014    2.695    0.007
##   cultatt ~~                                          
##     temperatt         0.265    0.096    2.766    0.006
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.050    0.009    5.788    0.000
##     actuhap           0.219    0.038    5.788    0.000
##     rigoract          1.035    0.179    5.788    0.000
##     depress          68.089   11.764    5.788    0.000
##     cultatt           1.063    0.184    5.788    0.000
##     temperatt         0.514    0.089    5.788    0.000
## 
## 
## Group 3 [3]:
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   ideahap ~                                           
##     cultatt   (k1)    0.035    0.020    1.777    0.076
##     temperatt (l3)    0.071    0.053    1.330    0.183
##   actuhap ~                                           
##     cultatt   (k4)    0.097    0.035    2.806    0.005
##     temperatt (l6)    0.221    0.083    2.668    0.008
##   rigoract ~                                          
##     ideahap           0.118    0.319    0.370    0.711
##     temperatt         0.299    0.157    1.911    0.056
##   depress ~                                           
##     actuhap          -0.079    1.531   -0.051    0.959
##     temperatt        -7.142    1.224   -5.836    0.000
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   rigoract ~~                                         
##     depress           0.000                           
##   ideahap ~~                                          
##     actuhap           0.049    0.023    2.171    0.030
##   cultatt ~~                                          
##     temperatt         0.092    0.073    1.252    0.211
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     ideahap           0.128    0.020    6.442    0.000
##     actuhap           0.311    0.048    6.442    0.000
##     rigoract          1.087    0.169    6.442    0.000
##     depress          61.947    9.616    6.442    0.000
##     cultatt           0.799    0.124    6.442    0.000
##     temperatt         0.547    0.085    6.442    0.000
anova(avt.group1, avt.group3) #debatable whether that's actually a good thing or not
## Chi Square Difference Test
## 
##            Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)  
## avt.group1 15 3662.9 3824.6 15.149                                
## avt.group3 19 3663.3 3811.6 23.582     8.4337       4    0.07692 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Relax equality constraints in constrained model

Here we’ll allow effect of temperatt on ideahap and on actuhap to vary across cultures. The group.partial command overrides the default behavior of the group.equal parameter, letting those particular regressions vary across cultures

avt.group4 = sem(avt.model2, fixed.x = F, data = d, group = "group", meanstructure = F, group.equal = "regressions", group.partial = c("ideahap ~ temperatt", "actuhap ~ temperatt"))
anova(avt.group2, avt.group4) 
## Chi Square Difference Test
## 
##            Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)  
## avt.group4 27 3653.9 3775.2 30.200                                
## avt.group2 31 3654.2 3762.0 38.476     8.2765       4    0.08196 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, debatable which is better.